Abstract

In this paper, we show how to manipulate syntax with binding using 
a mixed representation of names for free variables (with respect to 
the task in hand) and de Bruijn indices [5] for bound variables. By 
doing so, we retain the advantages of both representations: nam- 
ing supports easy, arithmetic-free manipulation of terms; de Bruijn 
indices eliminate the need for α-conversion. Further, we have ensured
that not only the user but also the implementation need never
deal with de Bruijn indices, except within key basic operations. 

Moreover, we give a hierarchical representation for names which 
naturally reflects the structure of the operations we implement. 
Name choice is safe and straightforward. Our technology combines 
easily with an approach to syntax manipulation inspired by Huet's 
'zippers' [10]. 

Without the ideas in this paper, we would have struggled to imple- 
ment EPIGRAM [19]. Our example — constructing inductive elimi- 
nation operators for datatype families — is but one of many where it 
proves invaluable. 
Their Representation; D.1.1 [Programming Techniques]: Applicative
(Functional) Programming
1 Introduction

This paper is about our everyday craft. It concerns, in particu- 
lar, naming in the implementation of systems which manipulate 
syntax-with-binding. The problems we address here are not so 
much concerned with computations within such syntaxes as con- 
structions over them. For example, given the declaration of an in- 
ductive datatype (by declaring the types of its constructors), how 
might one construct its induction principle? 

We encounter such issues all the time in the implementation of
EPIGRAM [19]. Even as we develop new technology to support
programming and reasoning in advanced type systems, we must
handle the issues they raise effectively with today's technology. We
work in Haskell and so do our students. When they ask us what 
to read in order to learn their trade, we tend to look blank and feel 
guilty. We want to do something about that. 

Let's look at the example of constructing an induction principle for 
a datatype. Suppose someone declares 

data Nat = Zero | Suc Nat

We should like to synthesize some statement corresponding to 

∀P ∈ Nat → Prop.
  P Zero →
  (∀k ∈ Nat. P k → P (Suc k)) →
  ∀n ∈ Nat. P n

In a theoretical presentation, we need not concern ourselves too 
much about where these names come from, and we can always 
choose them so that the sense is clear. In a practical implemen- 
tation, we have to be more cautious — the user (innocently or other- 
wise) may decide to declare 

data Nat = Zero | P Nat    or even    data P = Zero | Suc P

We'll have to be careful not to end up with such nonsense as 

∀P ∈ Nat → Prop.                        ∀P ∈ P → Prop.
  P Zero →                        or      P Zero →
  (∀k ∈ Nat. P k → P (P k)) →             (∀k ∈ P. P k → P (Suc k)) →
  ∀n ∈ Nat. P n                           ∀n ∈ P. P n

Fear of shadows may seem trivial, but it's no joke — some real sys- 
tems have this bug, although it would be invidious to name names. 

Possible alternative strategies include the adoption of one of de
Bruijn's systems of nameless dummies [5] for the local quantifiers,
either counting binders (including →, which we take to abbreviate ∀
where the bound variable isn't used) from the reference outward, giving
de Bruijn indices,

∀− ∈ Nat → Prop.
  0 Zero →
  (∀− ∈ Nat. 2 0 → 3 (Suc 1)) →
  ∀− ∈ Nat. 3 0

or from the outside inward — de Bruijn levels. 

∀0 ∈ Nat → Prop.
  0 Zero →
  (∀2 ∈ Nat. 0 2 → 0 (Suc 2)) →
  ∀3 ∈ Nat. 0 3

It's unfair to object that terms in de Bruijn syntax are unfit for hu- 
man consumption — they are not intended to be. Their main benefits 
lie in their uniform delivery of capture-avoiding substitution and 
their systematic resolution of α-equivalence. Our enemies can't
choose bad names in order to make trouble. 
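
To make the two counting conventions concrete, here is a small sketch that converts a named term to each nameless form. The types Named and Nameless and the functions toIndices and toLevels are illustrative assumptions, not part of the development in this paper, and the anonymous binder introduced by → is left out for brevity.

```haskell
import Data.List (elemIndex)

-- Named syntax, for illustration only.
data Named
  = NVar String
  | NApp Named Named
  | NAll String Named Named      -- ∀x ∈ dom. body
  deriving (Show, Eq)

-- Nameless syntax: locally bound variables are numbers.
data Nameless
  = Var Int                      -- bound by an enclosing ∀
  | Con String                   -- not bound locally, e.g. Nat
  | App Nameless Nameless
  | All Nameless Nameless
  deriving (Show, Eq)

-- de Bruijn indices: count binders from the reference outward.
toIndices :: Named -> Nameless
toIndices = go []
  where
    -- env lists the enclosing binders, innermost first
    go env (NVar x)          = maybe (Con x) Var (elemIndex x env)
    go env (NApp f a)        = App (go env f) (go env a)
    go env (NAll x dom body) = All (go env dom) (go (x : env) body)

-- de Bruijn levels: count binders from the outside inward.
toLevels :: Named -> Nameless
toLevels = go []
  where
    go env (NVar x) =
      maybe (Con x) (\i -> Var (length env - 1 - i)) (elemIndex x env)
    go env (NApp f a)        = App (go env f) (go env a)
    go env (NAll x dom body) = All (go env dom) (go (x : env) body)

-- ∀P ∈ Prop. ∀n ∈ Nat. P n
sample :: Named
sample =
  NAll "P" (NVar "Prop") (NAll "n" (NVar "Nat") (NApp (NVar "P") (NVar "n")))
```

With indices the body P n becomes 1 0, since n is the nearest binder; with levels it becomes 0 1, since P is the outermost.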

However, we do recommend that anyone planning to use de Bruijn 
syntax for systematic constructions like the above should think 
again. Performing constructions in either of these systems requires 
a lot of arithmetic. This obscures the idea being implemented, re- 
sults in unreadable, unreliable, unmaintainable code, and is besides 
hard work. We, or rather our programs, can't choose good names in 
order to make sense. 

A mixed representation of names provides a remedy. In this paper, 
we name free variables (i.e., variables bound in the context) so that
we can refer to them and rearrange them without the need to count; 
we give bound variables de Bruijn indices to ensure a canonical 
means of reference where there's no 'social agreement' on a name. 

The distinction between established linguistic signs, connecting a
signifiant (or 'signifier') with its signifié (or 'signified'), and local
signs, where the particular choice of signifier is arbitrary, was observed
in the context of natural language by Saussure [6]. In formal
languages, the idea of distinguishing free and bound variables syn- 
tactically is also far from new. It's a recurrent idiom in the work 
of Gentzen [8], Kleene [14] and Prawitz [24]. The second author
learned it from Randy Pollack who learned it in turn from Thierry 
Coquand [4]; the first author learned it from the second. 

The idea of using free names and bound indices is not new either — 
it's a common representation in interactive proof systems. This also 
comes to the authors from Randy Pollack [23], who cites the influence
of Gérard Huet in the Constructive Engine [9]. Here 'free'
means 'bound globally in the context' and 'bound' means 'bound 
locally in the goal'. The distinction is allied to the human user's 
perspective — the user proves an implication by introducing the hy- 
pothesis to the context, naming it H for easy reference, although 
other names are, we hear, permitted. By doing so, the user shifts 
perspective to one which is locally more convenient, even though 
the resulting proof is intended to apply regardless of naming. 

What's new in this paper is the use of similar perspective shifts 
to support the use of convenient naming in constructions where 
the 'user' is itself a program. These shifts are similar in charac- 
ter to those used by the second author (with Randy Pollack) when 
formalizing Pure Type Systems [20, 21], although in that work,
bound variables are distinguished from free variables but nonetheless
named. We draw on Huet's 'zipper' technique [10] to
help us write programs which navigate and modify the structure 
of terms. Huet equips syntax with an auxiliary datatype of struc- 
tural contexts. In our variation on his theme, we require naming as 
we navigate under binders to ensure that a structural context is also 
a linguistic context. In effect, whoever 'I' may be, if I am involved
in the discourse, then I am not a number: I am a free variable.

With many agents now engaged in the business of naming, we need 
a representation of names which readily supports the separation 
of namespaces between mechanical construction agents which call 
each other and indeed themselves. We adopt a hierarchical nam- 
ing system which permits multiple agents to choose multiple fresh 
names in a notionally asynchronous manner, without fear of clash- 
ing. Our design choice is unremarkable in the light of how humans 
address similar issues in the design of large computer systems. Both 
the ends and the means of exploiting names in human discourse be- 
come no less pertinent when the discourse is mechanical. 

As the above example may suggest, we develop our techniques in 
this paper for a fragment of a relational logic, featuring variables, 
application, and universal quantification. It can also be seen as a 
non-computational fragment of a dependent type theory. We've de- 
liberately avoided a computational language in order to keep the fo- 
cus on construction, but you can — and every day we do — certainly 
apply the same ideas to λ-calculi.

Overview 

In section 2 of this paper, we give the underlying data representa- 
tion for our example syntax and develop the key operations which 
manipulate bound variables — only here do we perform arithmetic 
on de Bruijn indices, and that is limited to tracking the outermost 
index as we recurse under binders. 

Section 3 shows the development of our basic construction and 
analysis operators for the syntax, and discusses navigation within 
expressions in the style of Huet [10]. Section 4 introduces our hi- 
erarchical technique for naming free variables in harmony with the 
call-hierarchy of agents which manipulate syntax. 

These components come together in Section 5, where we assemble 
a high-level toolkit for constructions over our syntax. Section 6 
puts this toolkit to work in a non-trivial example: the construction 
of induction principles for Epigram's datatype families [7, 15, 19]. 
2 An Example Syntax

Today, let us have variables, application, and universal quantification.
We choose an entirely first-order presentation:¹

infixl 9 :$
infixr 6 :→

data Expr = F Name         -- free variables
          | B Int          -- bound variables
          | Expr :$ Expr   -- application
          | Expr :→ Scope  -- ∀-quantification
  deriving (Show, Eq)

newtype Scope = Sc Expr deriving (Show, Eq)

We shall define Name later. For now, let us at least presume that
it supports the (==) test. Observe that expressions over a common
context of free Names can meaningfully be compared with the ordinary
(==) test: α-conversion is not an issue.

Some readers may be familiar with the use of nested datatypes and 
polymorphic recursion to enforce scope constraints precisely if you 
parametrize expressions by names [2, 3]. Indeed, with a depen- 
dently typed meta-language it's not so hard to enforce both scope 
and type for an object-language [1]. These advanced type systems 
can and should be used to give more precise types to the programs 
in this paper, but they would serve here only to distract readers not 
yet habituated to those systems from the implementation techniques 
which we seek to communicate here. 

Nonetheless, we do introduce a cosmetic type distinction to help 
us remember that the scope of a binder must be interpreted differ- 
ently. The Scope type stands in lieu of the precise 'term over one 
more variable' construction. For the most part, we shall pretend 
that Expr is the type of closed expressions — those with no 'dan- 
gling' bound variables pointing out of scope, and that Scope has 
one dangling bound variable, called B 0 at the top level. In order to 
support this pretence, however, we must first develop the key utili- 
ties which trade between free and bound variables, providing a high 
level interface to Scope. We shall have 

abstract :: Name -> Expr -> Scope
instantiate :: Expr -> Scope -> Expr

The operation abstract name turns a closed expression into a scope 
by turning name into B 0. Of course, as we push this operation un- 
der a binder, the correct index for name shifts along by one. That 
is, the image of name is always the outer de Bruijn index, hence we 
implement abstract via a helper function which tracks this value. 
Observe that the existing bound variables within expr's Scopes re- 
main untouched. 

abstract :: Name -> Expr -> Scope
abstract name expr = Sc (nameTo 0 expr) where
  nameTo outer (F name') | name == name' = B outer
                         | otherwise     = F name'
  nameTo outer (B index) = B index
  nameTo outer (fun :$ arg) =
    nameTo outer fun :$ nameTo outer arg
  nameTo outer (dom :→ Sc body) =
    nameTo outer dom :→ Sc (nameTo (outer + 1) body)

¹ The techniques in this paper adapt readily to higher-order
representations of binding, but that's another story.



Meanwhile, instantiate image turns a scope into an expression by
replacing the outer de Bruijn index (initially B 0) with image, which
we presume is closed. Of course, F name is closed, so we can use
instantiate (F name) to invert abstract name.

instantiate :: Expr -> Scope -> Expr
instantiate image (Sc body) = replace 0 body where
  replace outer (B index) | index == outer = image
                          | otherwise      = B index
  replace outer (F name) = F name
  replace outer (fun :$ arg) =
    replace outer fun :$ replace outer arg
  replace outer (dom :→ Sc body) =
    replace outer dom :→ Sc (replace (outer + 1) body)

Note that the choice of an unsophisticated de Bruijn indexed repre- 
sentation allows us to re-use the closed expression image, however 
many bound variables have become available when it is being ref- 
erenced. 
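
A short worked example may help to see the index shifting at work. The sketch below repeats the two operations so that it can be run on its own; it assumes plain strings for Name (the real definition comes later), and the names stepCase, overP and withQ are made up for illustration.

```haskell
infixl 9 :$
infixr 6 :→

-- Assumption for this sketch only: plain strings as names.
type Name = String

data Expr
  = F Name          -- free variables
  | B Int           -- bound variables
  | Expr :$ Expr    -- application
  | Expr :→ Scope   -- ∀-quantification
  deriving (Show, Eq)

newtype Scope = Sc Expr deriving (Show, Eq)

abstract :: Name -> Expr -> Scope
abstract name expr = Sc (nameTo 0 expr)
  where
    nameTo outer (F name')
      | name == name' = B outer
      | otherwise     = F name'
    nameTo _     (B index)        = B index
    nameTo outer (fun :$ arg)     = nameTo outer fun :$ nameTo outer arg
    nameTo outer (dom :→ Sc body) =
      nameTo outer dom :→ Sc (nameTo (outer + 1) body)

instantiate :: Expr -> Scope -> Expr
instantiate image (Sc body) = replace 0 body
  where
    replace outer (B index)
      | index == outer = image
      | otherwise      = B index
    replace _     (F name)      = F name
    replace outer (fun :$ arg)  = replace outer fun :$ replace outer arg
    replace outer (dom :→ Sc b) =
      replace outer dom :→ Sc (replace (outer + 1) b)

-- ∀k ∈ Nat. P k, with P still free
stepCase :: Expr
stepCase = F "Nat" :→ abstract "k" (F "P" :$ F "k")

-- Abstracting P pushes under the binder for k, so P becomes B 1 there.
overP :: Scope
overP = abstract "P" stepCase

-- The closed image Q is reused unchanged beneath the inner binder.
withQ :: Expr
withQ = instantiate (F "Q") overP
```

Here overP is Sc (F "Nat" :→ Sc (B 1 :$ B 0)), and instantiate (F "P") overP gives back stepCase.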

It is perfectly reasonable to develop these operations for other rep- 
resentations of bound variables, just as long as they're still kept sep- 
arate from the free variables. A de Bruijn level representation still 
has the benefit of canonical name-choice and cheap α-equivalence,
but it does mean that image must be shifted one level when we
push it under a binder. Moreover, if we were willing to pay for
α-equivalence and fresh-name generation for bound variables, we
could even use names, modifying the definition of Scope to pack 
them up. We feel that, whether or not you want to know the names 
of bound variables, it's better to arrange things so you don't have to 
care about the names of bound variables. 

Those with an eye for a generalization will have spotted that both 
abstract and instantiate can be expressed as instances of a single 
general-purpose higher-order substitution operation, parametrized 
by arbitrary operations on free and bound variables, themselves 
parametrized by outer. 

varChanger :: (Int -> Name -> Expr) ->
              (Int -> Int -> Expr) ->
              Expr -> Expr

We might well do this in practice, to reduce the 'boilerplate' code 
required by the separate first-order definitions. However, this oper- 
ation is unsafe in the wrong hands. 
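
One plausible reading of this generalization is sketched below; only the type of varChanger is given in the text, so the body and the two derived definitions are assumptions. Plain strings again stand in for Name.

```haskell
infixl 9 :$
infixr 6 :→

-- Assumption for this sketch only: plain strings as names.
type Name = String

data Expr = F Name | B Int | Expr :$ Expr | Expr :→ Scope
  deriving (Show, Eq)

newtype Scope = Sc Expr deriving (Show, Eq)

-- Traverse an expression, tracking the outer index, and hand every
-- variable to the supplied operation for its kind.
varChanger :: (Int -> Name -> Expr) -> (Int -> Int -> Expr) -> Expr -> Expr
varChanger onFree onBound = go 0
  where
    go outer (F name)         = onFree outer name
    go outer (B index)        = onBound outer index
    go outer (fun :$ arg)     = go outer fun :$ go outer arg
    go outer (dom :→ Sc body) = go outer dom :→ Sc (go (outer + 1) body)

-- abstract changes one free name and leaves bound variables alone.
abstract :: Name -> Expr -> Scope
abstract name = Sc . varChanger nameTo (\_ index -> B index)
  where
    nameTo outer name'
      | name == name' = B outer
      | otherwise     = F name'

-- instantiate changes the outer index and leaves free names alone.
instantiate :: Expr -> Scope -> Expr
instantiate image (Sc body) = varChanger (\_ name -> F name) replace body
  where
    replace outer index
      | index == outer = image
      | otherwise      = B index
```

The danger is visible in the types: nothing stops a caller from passing operations that manufacture dangling indices or ignore outer altogether.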

Another potential optimization, given that we often iterate these 
operations, is to generalize abstract, so that it turns a sequence 
of names into dangling indices, and correspondingly instantiate, 
replacing dangling indices with a sequence of closed expressions. 
We leave this as an exercise for the reader. 

From now on, outside of these operations, we maintain the invariant 
that Expr is only used for closed expressions and that Scopes have 
just one dangling index. The data constructors B and Sc have served 
their purpose — we forbid any further use of them. From now on, 
there are no de Bruijn numbers, only free variables. 

It's trivial to define substitution for closed expressions using 
abstract and instantiate (naturally, this also admits a less succinct, 
more efficient implementation): 

substitute :: Expr -> Name -> Expr -> Expr
substitute image name = instantiate image . abstract name

Next, let us see how instantiate and abstract enable us to navigate
under binders and back out again, without ever directly encounter- 
ing a de Bruijn index. 



3 Basic Analysis and Construction Operators 

We may readily define operators which attempt to analyse expres- 
sions, safely combining selection (testing which constructor is at 
the head) with projection (extracting subexpressions). Haskell's 
support for monads gives us a convenient means to handle failure 
when the 'wrong' constructor is present. Inverting ( :$ ) is straight- 
forward: 



unapply :: MonadPlus m => Expr -> m (Expr, Expr)
unapply (fun :$ arg) = return (fun, arg)
unapply _ = mzero



For our quantifier, however, we combine structural decomposition 
with the naming of the bound variable. Rather than splitting a quan- 
tified expression into a domain and a Scope, we shall extract a bind- 
ing and the closed Expr representing the range. We introduce a spe- 
cial type of pairs which happen to be bindings, rather than using 
ordinary tuples, just to make the appearance of programs suitably 
suggestive. We equip Binding with some useful coercions. 



infix 5 :∈

data Binding = Name :∈ Expr

bName :: Binding -> Name
bName (name :∈ _) = name

bVar :: Binding -> Expr
bVar = F . bName



Now we can develop a 'smart constructor' which introduces a uni- 
versal quantifier by discharging a binding, and its monadically lifted 
inverter: 



infixr 6 ⟶

(⟶) :: Binding -> Expr -> Expr
(name :∈ dom) ⟶ range = dom :→ abstract name range

infix ⟵

(⟵) :: MonadPlus m => Name -> Expr -> m (Binding, Expr)
name ⟵ (dom :→ scope) = return (name :∈ dom,
                                  instantiate (F name) scope)
name ⟵ _ = mzero
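
The following sketch shows the pair in use, with Maybe as the failure monad. It is self-contained, so the earlier definitions are repeated; plain strings stand in for Name, Binding derives Show and Eq only so that results can be inspected, and the example terms are made up.

```haskell
import Control.Monad (MonadPlus, mzero)

infixl 9 :$
infixr 6 :→
infix  5 :∈
infixr 6 ⟶
infix  9 ⟵

-- Assumption for this sketch only: plain strings as names.
type Name = String

data Expr = F Name | B Int | Expr :$ Expr | Expr :→ Scope
  deriving (Show, Eq)
newtype Scope = Sc Expr deriving (Show, Eq)
data Binding = Name :∈ Expr deriving (Show, Eq)

abstract :: Name -> Expr -> Scope
abstract name expr = Sc (nameTo 0 expr)
  where
    nameTo outer (F name') | name == name' = B outer
                           | otherwise     = F name'
    nameTo _     (B index)        = B index
    nameTo outer (fun :$ arg)     = nameTo outer fun :$ nameTo outer arg
    nameTo outer (dom :→ Sc body) =
      nameTo outer dom :→ Sc (nameTo (outer + 1) body)

instantiate :: Expr -> Scope -> Expr
instantiate image (Sc body) = replace 0 body
  where
    replace outer (B index) | index == outer = image
                            | otherwise      = B index
    replace _     (F name)      = F name
    replace outer (fun :$ arg)  = replace outer fun :$ replace outer arg
    replace outer (dom :→ Sc b) =
      replace outer dom :→ Sc (replace (outer + 1) b)

(⟶) :: Binding -> Expr -> Expr
(name :∈ dom) ⟶ range = dom :→ abstract name range

(⟵) :: MonadPlus m => Name -> Expr -> m (Binding, Expr)
name ⟵ (dom :→ scope) = return (name :∈ dom, instantiate (F name) scope)
_    ⟵ _              = mzero

-- ∀k ∈ Nat. P k, built by discharging the binding for k
quantified :: Expr
quantified = ("k" :∈ F "Nat") ⟶ (F "P" :$ F "k")

-- Open it again under a different name of our choosing.
opened :: Maybe (Binding, Expr)
opened = "n" ⟵ quantified

-- An application is not a quantifier, so the inverter fails.
notOpened :: Maybe (Binding, Expr)
notOpened = "n" ⟵ (F "P" :$ F "k")
```

Neither the construction nor the decomposition mentions B or Sc: opened is Just ("n" :∈ F "Nat", F "P" :$ F "n").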



3.1 Inspiration — the 'Zipper' 

We can give an account of one-hole contexts in the style of Huet's 
'zippers' [10]. A Zipper is a stack, storing the information required 
to reconstruct an expression tree from a particular subexpression 
at each step on the path back to the root. The operations defined 
above allow us to develop the corresponding one-step manoeuvres 
uniformly over the type (Zipper, Expr). 



infixl 4 :<

data Stack x = Empty | Stack x :< x deriving (Show, Eq)

type Zipper = Stack Step

data Step = Fun () Expr
          | Arg Expr ()
          | Dom () Scope
          | Range Binding ()

This zipper structure combines the notions of structural and lin- 
guistic context — a Zipper contains the bindings for the names which 
may appear in any Expr filling the 'hole'. Note that we don't bind 
the variable when we edit a domain: it's not in scope. We can 
easily edit these zippers, inserting new bindings (e.g., for induc- 
tive hypotheses) or permuting bindings where dependency permits, 
without needing to renumber de Bruijn variables. 
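
The one-step manoeuvres themselves are not spelled out above, so here is one way they might look. The function names intoFun, intoArg, intoDom, intoRange and up are assumptions, as is the use of plain strings for Name; Eq and Show are derived for Binding and Step only to make the results comparable.

```haskell
import Control.Monad (MonadPlus, mzero)

infixl 9 :$
infixr 6 :→
infix  5 :∈
infixr 6 ⟶
infix  9 ⟵
infixl 4 :<

type Name = String   -- assumption for this sketch only

data Expr = F Name | B Int | Expr :$ Expr | Expr :→ Scope
  deriving (Show, Eq)
newtype Scope = Sc Expr deriving (Show, Eq)
data Binding = Name :∈ Expr deriving (Show, Eq)

abstract :: Name -> Expr -> Scope
abstract name expr = Sc (nameTo 0 expr)
  where
    nameTo outer (F name') | name == name' = B outer
                           | otherwise     = F name'
    nameTo _     (B index)        = B index
    nameTo outer (fun :$ arg)     = nameTo outer fun :$ nameTo outer arg
    nameTo outer (dom :→ Sc body) =
      nameTo outer dom :→ Sc (nameTo (outer + 1) body)

instantiate :: Expr -> Scope -> Expr
instantiate image (Sc body) = replace 0 body
  where
    replace outer (B index) | index == outer = image
                            | otherwise      = B index
    replace _     (F name)      = F name
    replace outer (fun :$ arg)  = replace outer fun :$ replace outer arg
    replace outer (dom :→ Sc b) =
      replace outer dom :→ Sc (replace (outer + 1) b)

(⟶) :: Binding -> Expr -> Expr
(name :∈ dom) ⟶ range = dom :→ abstract name range

(⟵) :: MonadPlus m => Name -> Expr -> m (Binding, Expr)
name ⟵ (dom :→ scope) = return (name :∈ dom, instantiate (F name) scope)
_    ⟵ _              = mzero

data Stack x = Empty | Stack x :< x deriving (Show, Eq)
type Zipper = Stack Step
data Step = Fun () Expr | Arg Expr () | Dom () Scope | Range Binding ()
  deriving (Show, Eq)

-- Hypothetical one-step moves over (Zipper, Expr).
intoFun, intoArg, intoDom, up :: (Zipper, Expr) -> Maybe (Zipper, Expr)
intoFun (z, fun :$ arg) = Just (z :< Fun () arg, fun)
intoFun _               = Nothing
intoArg (z, fun :$ arg) = Just (z :< Arg fun (), arg)
intoArg _               = Nothing
-- Moving into a domain binds nothing: the variable is not in scope there.
intoDom (z, dom :→ scope) = Just (z :< Dom () scope, dom)
intoDom _                 = Nothing

-- Entering a range needs a name: the bound variable becomes free.
intoRange :: Name -> (Zipper, Expr) -> Maybe (Zipper, Expr)
intoRange name (z, expr) = do
  (bind, range) <- name ⟵ expr
  return (z :< Range bind (), range)

-- Moving up rebuilds the parent; leaving a range discharges its binding.
up (z :< Fun () arg, fun)      = Just (z, fun :$ arg)
up (z :< Arg fun (), arg)      = Just (z, fun :$ arg)
up (z :< Dom () scope, dom)    = Just (z, dom :→ scope)
up (z :< Range bind (), range) = Just (z, bind ⟶ range)
up (Empty, _)                  = Nothing
```

Only intoRange asks for a Name, and only the Range step records a Binding, which is what makes the zipper double as a linguistic context.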

By contrast, editing with the zipper constructed with respect to 
the raw definition of Expr — moving into scopes without binding 
variables — often requires a nightmare of arithmetic. The first au- 
thor banged his head on his Master's project [16] this way, before 
the second author caught him at it. 

The zipper construction provides a general-purpose presentation of 
navigation within expressions — that's a strength when we need to 
cope with navigation choices made by an external agency, such as 
the user of a structure editor. However, it's a weakness when we 
wish to support more focused editing strategies. In what follows, 
we'll be working not with the zipper itself, but with specific sub- 
types of it, representing particular kinds of one-hole context, such 
as 'quantifier prefix' or 'argument sequence'. Correspondingly, the 
operations we develop should be seen as specializations of Huet's. 

But hold on a moment! Before we can develop more systematic 
editing tools, we must address the fact that navigating under a 
binder requires the supply of a Name. Where is this name to come 
from? How is it to be represented? What has the former to do with 
the latter? Let's now consider naming. 

4 On Naming 

It's not unusual to find names represented as elements of String. 
However, for our purposes, that won't do. String does not have 
enough structure to reflect the way names get chosen. Choosing 
distinct names is easy if you're the only person doing it, because 
you can do it deliberately. However, if there is more than one agent 
choosing names, we encounter the possibility that their choices will 
overlap by accident. 

The machine must avoid choosing names already reserved by the 
user, whether or not those names have yet appeared. Moreover, as 
our programs decompose tasks into subtasks, we must avoid nam- 
ing conflicts between the subprograms which address them. Indeed, 
we must avoid naming conflicts arising from different appeals to the 
same subprogram. 

How do we achieve this? One way is to introduce a global symbol 
generator, mangling names to ensure they are globally unique; an- 
other approach requires a global counter, incremented each time a 
name is chosen. This state-based approach fills names with mean- 
ingless numbers, and it unnecessarily sequentializes the execution 
of operations — a process cannot begin to generate names until its 
predecessors have finished doing so. 

Our approach is familiar from the context of module systems or
object-oriented programming. We control the anarchy of naming 
by introducing hierarchical names. 

type Name = Stack (String, Int) 

We can use hierarchical names to reflect the hierarchy of tasks. We 
ensure that each subtask has a distinct prefix from which to form 
its names by extension. This directly rules out the possibility that 
different subtasks might choose the same name by accident and al- 
lows them to choose fresh names asynchronously. The remaining 
obligation — to ensure that each subtask makes distinct choices for 
the names under its own control — is easily discharged. 

Superiority within the hierarchy of names is just the partial order
induced by 'being a prefix':

xs ≼ (xs <+ ys)

infixl 4 <+

(<+) :: Stack x -> Stack x -> Stack x
xs <+ Empty     = xs
xs <+ (ys :< y) = xs <+ ys :< y

We say that two names are independent, xs ⊥ ys, if neither xs ≼ ys
nor ys ≼ xs. Two independent names must differ at some leftmost
point in the stack: whatever extensions we make of them, they will
still differ at that point in the stack.

xs ⊥ ys → (xs <+ xs') ⊥ (ys <+ ys')

In order to work correctly with hierarchical names, the remaining
idea we need is to name the agents which carry out the tasks, as well
as the free variables. Each agent must choose independent names
not only for the free variables it creates, but also for the sub-agents
it calls: this is readily accomplished by ensuring that every agent
only ever chooses names which strictly and independently extend
its own 'root' name. This ensures that the naming hierarchy
reflects the call-hierarchy of agents.

root
  root's variables:
    root :< ("x",0), ..., root :< ("x",m),
    root :< ("y",0), ..., root :< ("y",n), ...
  root's agents:
    root :< ("a",0)
      (root :< ("a",0))'s variables:
        root :< ("a",0) :< ("x",0), ...
      (root :< ("a",0))'s agents:
        root :< ("a",0) :< ("a",0), ...
    ...
    root :< ("a",k)
      (root :< ("a",k))'s variables:
        root :< ("a",k) :< ("x",0), ...
      (root :< ("a",k))'s agents:
        root :< ("a",k) :< ("a",0), ...

Note the convenience of (String, Int) as the type of name elements.
The Strings give us legibility; the Ints an easy way to express uniform
sequences of distinct name-extensions x0, ..., xn. Two little
helpers will make simple names easier to construct:

infixl 6 //

(//) :: Name -> String -> Name
root // s = root :< (s, 0)

nm :: String -> Name
nm s = Empty // s

Our scheme of naming thus localizes choice of fresh names, making
it easy to manage, even in recursive constructions. We only
need a global name generator when printing de Bruijn syntax in
user-legible form, and even then only to provide names which
correspond closely to those for which the user has indicated a
preference.

We shall develop our operations in the form of agencies.

type Agency agentT = Name -> agentT

That is, an Agency agentT takes a 'root' name to an agent of type
agentT with that name.

You've already seen an agency: the under-binding navigator,
which may be retyped

infix ⟵

(⟵) :: MonadPlus m => Agency (Expr -> m (Binding, Expr))

That is, (root ⟵) is the agent which binds root by decomposing a
quantifier. Note that here the agent which creates the binding shares
its name: the variable means 'the thing made by the agent', so this
arrangement is quite convenient. It fits directly with our standard
practice of using 'metavariables' to stand for the unknown parts of
a construction, each associated with an agent trying to deduce its
value.
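
The prefix order and the independence relation are easy to make executable. In the sketch below, elements, superior and independent are hypothetical helper names (the text only uses the relations informally), and the agent names are invented for the example.

```haskell
import Data.List (isPrefixOf)

infixl 4 :<
infixl 4 <+
infixl 6 //

data Stack x = Empty | Stack x :< x deriving (Show, Eq)
type Name = Stack (String, Int)

(<+) :: Stack x -> Stack x -> Stack x
xs <+ Empty     = xs
xs <+ (ys :< y) = xs <+ ys :< y

(//) :: Name -> String -> Name
root // s = root :< (s, 0)

nm :: String -> Name
nm s = Empty // s

-- Hypothetical helper: the elements of a stack, root first.
elements :: Stack x -> [x]
elements = go []
  where
    go acc Empty     = acc
    go acc (xs :< x) = go (x : acc) xs

-- xs ≼ ys: xs is a prefix of ys (hypothetical name).
superior :: Name -> Name -> Bool
superior xs ys = elements xs `isPrefixOf` elements ys

-- xs ⊥ ys: neither name is a prefix of the other (hypothetical name).
independent :: Name -> Name -> Bool
independent xs ys = not (superior xs ys) && not (superior ys xs)

-- Two sub-agents of one root, and a variable chosen by each.
agent0, agent1, var0, var1 :: Name
agent0 = nm "elim" :< ("a", 0)
agent1 = nm "elim" :< ("a", 1)
var0   = agent0 // "x"
var1   = agent1 // "x"
```

Both sub-agents pick the local name "x", yet var0 and var1 are independent because agent0 and agent1 already differ; an agent and its own variable, by contrast, are related by the prefix order.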

5 A Higher-Level Construction Kit 

Let's now build higher-level tools for composing and decompos- 
ing expressions. Firstly, we'll have equipment for working with a 
quantifier prefix, rather than individual bindings. Here is the operator
which discharges a prefix over an expression, iterating ⟶.

type Prefix = Stack Binding

infixr 6 ↠

(↠) :: Prefix -> Expr -> Expr
Empty           ↠ expr  = expr
(binds :< bind) ↠ range = binds ↠ bind ⟶ range

The corresponding destructor is an agency. Given a root and a string
x, it delivers a quantifier prefix with names of the form root :< (x, i)
where the 'subscript' i is numbered from 1:

unprefix :: Agency (String -> Expr -> (Prefix, Expr))
unprefix root x expr = intro 1 (Empty, expr) where
  intro :: Int -> (Prefix, Expr) -> (Prefix, Expr)
  intro i (binds, expr) = case (root :< (x, i)) ⟵ expr of
    Just (bind, range) -> intro (i + 1) (binds :< bind, range)
    Nothing            -> (binds, expr)

Note that intro specifically exploits the Maybe instance of the
monadically lifted binding agency (⟵).

If root is independent of all the names in expr — which it will be, if 
we maintain our hierarchical discipline — and 

unprefix root x expr = (binds, range)

then range is unqualified and expr = binds ↠ range.

A little example will show how these tools are used. Suppose we 
wish to implement the weakening agency, which inserts a new hy- 
pothesis y with a given domain into a quantified expression after 
all the old ones (x1, ..., xn). Here's how we do it safely and with
names, not arithmetic. 

weaken :: Agency (Expr -> Expr -> Expr)
weaken root dom expr =
  xdoms ↠ (root // "y" :∈ dom) ⟶ range
  where (xdoms, range) = unprefix root "x" expr

As ever, the independence of the root supplied to the agency is 
enough to ensure the freshness of the names chosen locally by the 
agent. 
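
Assembling the pieces so far gives a version of weaken that can be run as it stands. The definitions are repeated from the text; Binding derives Show and Eq only so that results can be compared, and the example names (k, Nat, P, Q and the root w) are invented.

```haskell
import Control.Monad (MonadPlus, mzero)

infixl 9 :$
infixr 6 :→
infix  5 :∈
infixr 6 ⟶
infixr 6 ↠
infix  9 ⟵
infixl 4 :<
infixl 6 //

data Stack x = Empty | Stack x :< x deriving (Show, Eq)
type Name = Stack (String, Int)
type Agency agentT = Name -> agentT

(//) :: Name -> String -> Name
root // s = root :< (s, 0)

nm :: String -> Name
nm s = Empty // s

data Expr = F Name | B Int | Expr :$ Expr | Expr :→ Scope
  deriving (Show, Eq)
newtype Scope = Sc Expr deriving (Show, Eq)
data Binding = Name :∈ Expr deriving (Show, Eq)
type Prefix = Stack Binding

abstract :: Name -> Expr -> Scope
abstract name expr = Sc (nameTo 0 expr)
  where
    nameTo outer (F name') | name == name' = B outer
                           | otherwise     = F name'
    nameTo _     (B index)        = B index
    nameTo outer (fun :$ arg)     = nameTo outer fun :$ nameTo outer arg
    nameTo outer (dom :→ Sc body) =
      nameTo outer dom :→ Sc (nameTo (outer + 1) body)

instantiate :: Expr -> Scope -> Expr
instantiate image (Sc body) = replace 0 body
  where
    replace outer (B index) | index == outer = image
                            | otherwise      = B index
    replace _     (F name)      = F name
    replace outer (fun :$ arg)  = replace outer fun :$ replace outer arg
    replace outer (dom :→ Sc b) =
      replace outer dom :→ Sc (replace (outer + 1) b)

(⟶) :: Binding -> Expr -> Expr
(name :∈ dom) ⟶ range = dom :→ abstract name range

(⟵) :: MonadPlus m => Agency (Expr -> m (Binding, Expr))
name ⟵ (dom :→ scope) = return (name :∈ dom, instantiate (F name) scope)
_    ⟵ _              = mzero

(↠) :: Prefix -> Expr -> Expr
Empty           ↠ expr  = expr
(binds :< bind) ↠ range = binds ↠ bind ⟶ range

unprefix :: Agency (String -> Expr -> (Prefix, Expr))
unprefix root x expr = intro 1 (Empty, expr)
  where
    intro :: Int -> (Prefix, Expr) -> (Prefix, Expr)
    intro i (binds, e) = case (root :< (x, i)) ⟵ e of
      Just (bind, range) -> intro (i + 1) (binds :< bind, range)
      Nothing            -> (binds, e)

weaken :: Agency (Expr -> Expr -> Expr)
weaken root dom expr = xdoms ↠ (root // "y" :∈ dom) ⟶ range
  where (xdoms, range) = unprefix root "x" expr

-- ∀k ∈ Nat. P k
before :: Expr
before = (nm "k" :∈ F (nm "Nat")) ⟶ (F (nm "P") :$ F (nm "k"))

-- ∀k ∈ Nat. ∀y ∈ Q. P k, built without touching an index
after :: Expr
after = weaken (nm "w") (F (nm "Q")) before
```

In after, the reference to k has moved from B 0 to B 1, yet weaken never mentions a number: the shift falls out of discharging the prefix again.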

We shall also need to build and decompose applications in terms 
of argument sequences, represented via [Expr]. First, we iterate :$, 
yielding $$. 

infixl 9 $$

($$) :: Expr -> [Expr] -> Expr
expr $$ []           = expr
fun  $$ (arg : args) = fun :$ arg $$ args

Next, we build the destructor — this does not need to be an agency, 
as it binds no names: 

unapplies :: Expr -> (Expr, [Expr])
unapplies expr = peel (expr, []) where
  peel (fun :$ arg, args) = peel (fun, arg : args)
  peel funargs            = funargs
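
As a quick check that the two operations are mutually inverse on a spine, here is a self-contained sketch. Plain strings stand in for Name, which is an assumption made only to keep the example short, and the term spine is invented.

```haskell
infixl 9 :$
infixr 6 :→
infixl 9 $$

-- Assumption for this sketch only: plain strings as names.
type Name = String

data Expr = F Name | B Int | Expr :$ Expr | Expr :→ Scope
  deriving (Show, Eq)
newtype Scope = Sc Expr deriving (Show, Eq)

-- Apply a head to a sequence of arguments, left to right.
($$) :: Expr -> [Expr] -> Expr
expr $$ []           = expr
fun  $$ (arg : args) = fun :$ arg $$ args

-- Peel arguments off the right, accumulating them in order.
unapplies :: Expr -> (Expr, [Expr])
unapplies expr = peel (expr, [])
  where
    peel (fun :$ arg, args) = peel (fun, arg : args)
    peel funargs            = funargs

-- Vcons X n x xs as a head applied to an argument sequence
spine :: Expr
spine = F "Vcons" $$ [F "X", F "n", F "x", F "xs"]
```

Here spine is the left-nested application F "Vcons" :$ F "X" :$ F "n" :$ F "x" :$ F "xs", and unapplies spine recovers the head together with the four arguments in their original order.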

Meaningful formulae in this particular language of expressions all
fit the pattern ∀x1 : X1. ... ∀xm : Xm. R e1 ... en, where R is a
variable. Of course, either the quantifier prefix or the argument
sequence or both may be empty; this pattern excludes only applications
of quantified formulae, and these are meaningless. Note
that the same is not true of languages with λ-abstraction and
β-redices, but here we may reasonably presume that the meaningless
case never happens, and develop a one-stop analysis agency:

data Analysis = ForAll Prefix Name [Expr]

analysis :: Agency (String -> Expr -> Analysis)
analysis root x expr = ForAll prefix f args where
  (prefix, range) = unprefix root x expr
  (F f, args) = unapplies range

Again, the datatype Analysis is introduced only to make the appear- 
ance of the result suitably suggestive of its meaning, especially in 
patterns. 



The final piece of kit we shall define in this section delivers the ap- 
plication of a variable to a quantifier prefix — in practice, usually the 
very quantifier prefix over which it is abstracted, yielding a typical 
application of a functional object: 

infixl 9 -$$

(-$$) :: Name -> Prefix -> Expr
f -$$ parameters = apply (F f) parameters where
  apply expr Empty = expr
  apply fun (binds :< a :∈ _) = apply fun binds :$ F a

An example of this in action is the generalization functional. This 
takes a prefix and a binding, returning a transformed binding ab- 
stracted over the prefix, together with the function which updates 
expressions accordingly. 

generalize :: Prefix -> Binding -> (Binding, Expr -> Expr)
generalize binds (name :∈ expr) =
  (name :∈ binds ↠ expr, substitute (name -$$ binds) name)

Indeed, working in a λ-calculus, these tools make it easy to implement
λ-lifting [12], and also the 'raising' step in Miller's unification
algorithm, working under a mixed prefix of existential and universal 
quantifiers [22]. 

6 Example — inductive elimination operators 
for datatype families 

We shall now use our tools to develop our example — constructing 
induction principles. To make things a little more challenging, and 
a little closer to home, let us consider the more general problem 
of constructing the inductive elimination operator for a datatype 
family [7].

Datatype families are collections of sets defined not parametrically 
as in Hindley-Milner languages, but by mutual induction, indexed 
over other data. They are the cornerstone of our dependently typed 
programming language, EPIGRAM [19]. We present them by first 
declaring the type constructor, explaining the indexing structure, 
and then the data constructors, explaining how larger elements of 
types in the family are built from smaller ones. A common example 
is the family of vectors — lists indexed by element type and length. 
In EPIGRAM, we would write: 

data    X : ⋆    n : Nat
      ---------------------
          Vec X n : ⋆

where                             x : X    xs : Vec X n
      ---------------------     ------------------------------
        Vnil : Vec X Zero         Vcons x xs : Vec X (Suc n)

That is, the Vnil constructor only makes empty vectors, whilst Vcons 
extends length by exactly one. This definition would elaborate (by 
a process rather like Hindley-Milner type inference) to a series of 
more explicit declarations in a language rather like that which we 
study in this paper: 

Vec : ∀X ∈ Set. ∀n ∈ Nat. Set
Vnil : ∀X ∈ Set. Vec X Zero
Vcons : ∀X ∈ Set. ∀n ∈ Nat. ∀x ∈ X. ∀xs ∈ Vec X n. Vec X (Suc n)

The elimination operator for vectors takes three kinds of arguments: 
first, the targets (the vector to be eliminated, preceded by the indices
of its type); second, the motive², explaining what is to be
achieved by the elimination; and third, the methods, explaining how 
the motive is to be pursued for each constructor in turn. Here it is, 
made fully explicit: 



Vec-Ind ∈
  ∀X ∈ Set. ∀n ∈ Nat. ∀xs ∈ Vec X n.                  (targets)
  ∀P ∈ ∀X ∈ Set. ∀n ∈ Nat. ∀xs ∈ Vec X n. Set.        (motive)
  ∀mn ∈ ∀X ∈ Set. P X Zero (Vnil X).                  (methods)
  ∀mc ∈ ∀X ∈ Set. ∀n ∈ Nat. ∀x ∈ X.
        ∀xs ∈ Vec X n. ∀h ∈ P X n xs.
        P X (Suc n) (Vcons X n x xs).
  P X n xs

It is not hard to appreciate that constructing such expressions us- 
ing only strings for variables provides a legion of opportunities for 
unlawful capture and abuse. On the other hand, the arithmetic in- 
volved in a purely de Bruijn indexed construction is truly terrifying. 
But with our tools, the construction is straightforward and safe.

To simplify the exposition, we shall presume that the declaration of 
the family takes the form of a binding for the type constructor and 
a context of data constructors which have already been checked for 
validity, say, according to the schema given by Luo [15] — checking 
as we go just requires a little extra work and a shift to an appropriate 
monad. Luo's schema is a sound (but by no means complete) set 
of syntactic conditions on family declarations which guarantee the 
existence of a semantically meaningful induction principle. The 
relevant conditions and the corresponding constructions are 

1. The type constructor is typed as follows

   F : ∀i1 : I1. ... ∀in : In. Set

   Correspondingly, the target prefix is ∀i⃗ : I⃗. ∀x : F i⃗, and the
   motive has type P : ∀i⃗ : I⃗. ∀x : F i⃗. Set.

2. Each constructor has type

   c : ∀a1 : A1. ... ∀am : Am. F s1 ... sn

   where the s⃗ do not mention F. The corresponding method has
   type

   ∀a⃗ : A⃗. ∀h⃗ : H⃗. P s⃗ (c a⃗)

   where the H⃗ are the inductive hypotheses, specified as follows.

3. Non-recursive constructor arguments a : A do not mention F
   in A and contribute no inductive hypothesis.

4. Recursive constructor arguments have form

   a : ∀y1 : Y1. ... ∀yk : Yk. F r⃗

   where F is not mentioned³ in the Y⃗ or the r⃗. The corresponding
   inductive hypothesis is

   h : ∀y⃗ : Y⃗. P r⃗ (a y⃗)

² We prefer 'motive' [17] to 'induction predicate', because a motive
need not be a predicate (i.e., a constructor of propositions) nor
need an elimination operator be inductive.

³ This condition is known as strict positivity.


Observe that condition 4 allows for the inclusion of higher-order 
recursive arguments, parametrized by some y : Y. These support 
structures containing infinitary data, such as 

    data InfTree : * where Leaf : InfTree
                           Node : (Nat → InfTree) → InfTree

We neglected to include these structures in our paper presentation 
of EPIGRAM [19] because they would have reduced our light-to- 
heat ratio for no profit — we gave no examples which involved them. 
However, as you shall shortly see, they do not complicate the imple- 
mentation in the slightest — the corresponding inductive hypothesis 
is parametrized by the same prefix y : Y. 
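A Haskell analogue may make this concrete. The sketch below is an illustration only: infTreeElim is a non-dependent eliminator introduced here, Natural stands in for Nat, and chain and leftDepth are hypothetical helpers. The method for Node receives the higher-order argument f together with a hypothesis that takes the same parameter y as f does.

```haskell
import Numeric.Natural (Natural)

-- Infinitely branching trees: a Node has one subtree per natural number.
data InfTree = Leaf | Node (Natural -> InfTree)

-- Non-dependent eliminator (hypothetical name). The Node method gets
-- the argument f and a hypothesis parametrized by the same y.
infTreeElim :: r
            -> ((Natural -> InfTree) -> (Natural -> r) -> r)
            -> InfTree
            -> r
infTreeElim leaf _    Leaf     = leaf
infTreeElim leaf node (Node f) = node f (\y -> infTreeElim leaf node (f y))

-- A tree whose every branch has depth n.
chain :: Natural -> InfTree
chain 0 = Leaf
chain n = Node (\_ -> chain (n - 1))

-- Usage: count the Nodes met by always following branch 0.
leftDepth :: InfTree -> Natural
leftDepth = infTreeElim 0 (\_ h -> 1 + h 0)
```

Because the hypothesis is a function, the method decides which branches to inspect; laziness ensures the others are never built.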

Our agency for inductive elimination operators follows Luo's recipe 
directly. The basic outline is as follows: 

    makeIndElim :: Agency (Binding → Prefix → Binding)
    makeIndElim root (family :∈ famtype) constructors =
      root :∈ targets -↠
              motive -→
              fmap method constructors -↠
              bName motive -$$ targets
      where -- constructions from condition 1
        ForAll indices set [] =
          analysis root "i" famtype
        targets = indices :<
                  root // "x" :∈ family -$$ indices
        motive = root // "P" :∈ targets -↠
                 F (nm "Set")
        method :: Binding → Binding



As we have seen before, makeIndElim is an agency which constructs
a binding: the intended name of the elimination operator
is used as the name of the agent. The analysis function readily 
extracts the indices from the type of the family (we presume that 
this ranges over Set). From here, we can make the type of an el- 
ement with those indices, and hence compute the prefix of targets 
over which the motive is abstracted. Presuming we can build an 
appropriate method for each constructor, we can now assemble our 
induction principle. 
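The way an agent derives fresh names from its own name can be sketched in a few lines. We assume here, as a simplification, that a name is a list of (String, Int) components and that // appends a component with index 0; ownedBy, elimNames and the component strings "x" and "P" are illustrative choices only.

```haskell
import Data.List (isPrefixOf)

-- Assumption: a hierarchical name is a path of (label, index) components.
type Name = [(String, Int)]

infixl 6 //

-- Extend an agent's root name with a fresh component.
(//) :: Name -> String -> Name
root // s = root ++ [(s, 0)]

-- A name belongs to an agent when the agent's root is a prefix of it.
ownedBy :: Name -> Name -> Bool
ownedBy root name = root `isPrefixOf` name

-- Usage: names an elimination-operator agent might choose for its
-- target variable and its motive.
elimNames :: Name -> (Name, Name)
elimNames root = (root // "x", root // "P")
```

Two agents with distinct roots can never choose the same name, so no global supply of fresh names is needed.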

But how do we build a method for a constructor? Let us implement 
the constructions corresponding to condition 2. 

    method :: Binding → Binding
    method (con :∈ contype) =
      meth :∈ conargs -↠
              (indhyp =<< conargs) -↠
              bVar motive $$ conindices :$ (con -$$ conargs)
      where
        meth = root // "m" o- con
        ForAll conargs fam conindices =
          analysis meth "a" contype
        indhyp :: Binding → Prefix



The method's type says that the motive should hold for those targets 
which can possibly be built by the constructor, given the constructor's
arguments, together with inductive hypotheses for those of its
arguments which happen to be recursive. We can easily combine the 
hypothesis constructions for non-recursive and recursive arguments 
(3 and 4, above) by making Stack an instance of the MonadPlus 
class in exactly the same 'list of successes' style as we have for 
ordinary lists [25]. The non-recursive constructor arguments give 
rise to an empty Prefix (= Stack Binding) of inductive hypothesis 
bindings. 

    indhyp :: Binding → Prefix
    indhyp (arg :∈ argtype) = do
      guard (argfam == family)  -- no hyp if arg non-recursive
      return (arg // "h" :∈ argargs -↠
                bVar motive $$ argindices
                            :$ (arg -$$ argargs))
      where ForAll argargs argfam argindices =
              analysis meth "y" argtype

With this, our construction is complete. 
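The 'list of successes' idiom behind indhyp can be seen in isolation with ordinary lists. The sketch below is a deliberate simplification: Arg, hypFor and hypsFor are hypothetical, and an argument is reduced to its name and the head of its type, where the real code works on bindings and expressions.

```haskell
import Control.Monad (guard)

-- Hypothetical, much-simplified constructor argument: its name and the
-- head symbol of its type (e.g. "X" or "Vec").
data Arg = Arg { argName :: String, argHead :: String }

-- Zero or one hypothesis name per argument.
hypFor :: String -> Arg -> [String]
hypFor family arg = do
  guard (argHead arg == family)   -- no hypothesis if arg is non-recursive
  return (argName arg ++ "h")

-- (=<<) on lists concatenates the per-argument results, so non-recursive
-- arguments simply vanish from the prefix of hypotheses.
hypsFor :: String -> [Arg] -> [String]
hypsFor family args = hypFor family =<< args
```

For the arguments n, x and xs of Vcons, only xs has a type headed by Vec, so hypsFor "Vec" yields a single hypothesis.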

Epilogue 

In this paper, we have shown how to manipulate syntax with bind- 
ing using a mixed representation of names for free variables (with 
respect to the task in hand) and de Bruijn indices [5] for bound 
variables. By doing so, we retain the advantages of both repre- 
sentations: naming supports easy, arithmetic-free manipulation of 
terms; de Bruijn indices eliminate the need for α-conversion. Further,
we have ensured that not only the user but also the implementation
need never deal with de Bruijn indices, except within key
basic operations such as abstract and instantiate. 
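As a reminder of what these two operations do, here is a minimal sketch. It assumes plain String names and an untyped binder Lam, both simplifications made for brevity, and it assumes the expression being substituted by instantiate contains no dangling de Bruijn indices.

```haskell
-- Mixed representation: names for free variables, indices for bound ones.
data Expr
  = F String        -- free variable, by name (assumption: plain Strings)
  | B Int           -- bound variable, by de Bruijn index
  | Expr :$ Expr    -- application
  | Lam Scope       -- binder (untyped here, for brevity)
  deriving (Eq, Show)

newtype Scope = Scope Expr deriving (Eq, Show)

infixl 9 :$

-- Turn a named free variable into the variable bound by a new scope.
abstract :: String -> Expr -> Scope
abstract me = Scope . go 0
  where
    go outer (F you) | you == me = B outer
                     | otherwise = F you
    go _     (B i)               = B i
    go outer (f :$ a)            = go outer f :$ go outer a
    go outer (Lam (Scope body))  = Lam (Scope (go (outer + 1) body))

-- Replace the variable bound by a scope with a (closed) expression.
instantiate :: Expr -> Scope -> Expr
instantiate what (Scope body) = go 0 body
  where
    go outer (B i) | i == outer = what
                   | otherwise  = B i
    go _     (F n)              = F n
    go outer (f :$ a)           = go outer f :$ go outer a
    go outer (Lam (Scope b))    = Lam (Scope (go (outer + 1) b))
```

All index arithmetic is confined to the two local go functions; client code only ever sees names.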

Moreover, we have chosen a representation for names which read- 
ily supports a power structure naturally reflecting the structure of 
agents within the implementation. Name choice is safe and
straightforward. Our technology combines easily with an approach to
syntax manipulation inspired by Huet's 'zippers' [10].

Of course, it takes some effort to ensure that name-roots are prop- 
agated correctly through the call hierarchy of a large system. We 
can manage the details of this in practice by working within an ap- 
propriate monad. The monad which we use also manages the book- 
keeping for the recursive solution of metavariables by expressions 
in terms of other metavariables (whose names are extensions of the 
original) — this process is beyond the scope of this paper. 

Without the ideas in this paper (amongst many others) it would 
have been much more difficult to implement EPIGRAM [18]. Our
example, constructing inductive elimination operators for datatype
families, is but one of many where it proves invaluable. Others
include λ-lifting [12] and Miller-style unification [22].

More particularly, this technology evolved from our struggle to im- 
plement the 'elimination with a motive' approach [17], central to 
the elaboration of EPIGRAM programs into Type Theory. This 
transforms a problem containing a specific instance of a datatype 
family 

$$\forall \vec{s} : \vec{S}.\ \forall x : F\,\vec{t}.\ T$$

into an equivalent problem which is immediately susceptible to 
elimination with operators like those constructed in our example. 



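$$
\begin{array}{l}
\forall \vec{\imath} : \vec{I}.\ \forall x' : F\,\vec{\imath}. \\
\forall \vec{s} : \vec{S}.\ \forall x : F\,\vec{t}. \\
\vec{\imath} = \vec{t} \to x' = x \to \\
T
\end{array}
$$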



Moreover, EPIGRAM source code is edited and elaborated into an 
underlying type theory incrementally, in no fixed order and with 
considerable dependency between components. The elaboration 
process is, in effect, code-driven tactical theorem-proving working 
on multiple interrelated problems simultaneously. Our principled 
approach to manipulating abstract syntax within multiple agents 
provides the key discipline we need in order to manage this process 
easily. We simply could not afford to leave these issues unanalysed. 

Whatever the syntax you may find yourself manipulating, and 
whether or not it involves dependent types, the techniques we have 
illustrated provide one way to make the job easier. By making
computers use names the way people do, we hope you can accomplish
such tasks straightforwardly, and without becoming a prisoner of 
numbers. 